ABSTRACT

When a person seeks another person's attention, it is of prime importance to assess how interruptible the other person is. Since smartphones are now ubiquitous as communication media, interruptibility prediction on smartphones has attracted great interest from both academia and industry. Previous studies have generally modeled interruptibility using behaviors at the current moment and in the immediate past (e.g., the preceding 5 minutes). However, a person's interruptibility at a given moment is also affected by his/her earlier behaviors, for several reasons. Motivated by this long-term effect, in this project we propose a novel methodology for extracting features based on past behaviors from smartphone sensor data. The primary difference from previous studies is that we systematically consider a longer history of up to a day in addition to the current point and the immediate past. To represent behaviors in a day accurately and compactly, our methodology divides a day into multiple timeslots and then, for each timeslot, derives relevant features such as the temporal shapes of the time series of the sensor data. To verify the advantage of our methodology, we collected a data set of smartphone usage from 25 participants over four weeks and obtained a license to a large-scale public data set constructed from 907 users over approximately 9 months. The experimental results on the two data sets show that looking back to the beginning of the current day improves prediction accuracy by up to 13% and 8%, respectively, compared with the baseline and state-of-the-art methods.


INTRODUCTION 

Human interruptibility, or simply interruptibility, is generally defined as the degree to which it is opportune to interrupt a person [15]. The probability of replying to an instant message or checking a notification at a particular moment is a typical example of interruptibility. Interruptibility prediction is then the task of assessing another person's interruptibility prior to interacting with him/her [8, 14]. With accurate prediction, we can expect a quick and high-quality response to an interruption, and the cognitive burden on the interrupted person is reduced significantly [11]. Thus, the importance of interruptibility prediction is widely recognized, since it benefits both those who interrupt and those who are interrupted [28]. Interruptibility prediction has been extensively studied in various scenarios: office environments [8, 14], desktop computers [12, 13], and mobile devices [18, 20, 21, 22, 23, 24, 27]. In particular, owing to the prevalence of mobile devices [4, 25, 27], a great deal of research effort is currently being devoted to interruptibility prediction on mobile devices, especially smartphones.

Previous studies have demonstrated that interruptibility can be predicted fairly well (with an accuracy of over 70%) using various types of context information. One active topic in this direction is determining what data to capture to represent the current context [19, 28], because advances in ubiquitous sensing technology provide abundant contextual data. Regardless of the data sources, one dominant assumption is that the current context can best be modeled by the observations obtained at that exact point in time. That is, previous studies attempted to represent interruptibility mostly using "present-time" features captured at the specific moment, such as the current ringer mode and the current screen on/off status. In addition to these features, more recent studies have started considering users' past behaviors, such as events occurring in the previous one or five minutes [8, 14, 24] and the time elapsed since the last event [20, 21, 22]. This is reasonable because the consequences of past behaviors and history are part of the current context [9]. Nonetheless, these studies still do not consider past behaviors extensively, since they reflect only the immediate past.

In this research, we tackle the problem of systematically incorporating past behaviors into interruptibility prediction. Unlike existing research that considers only the present time and the immediate past, our methodology considers a longer history of up to one day. We contend that proper consideration of past behaviors plays a key role in accurate prediction. The intuition behind our methodology is twofold:

• Self-regulation: People consciously manage or guide their own thoughts and behaviors [16]. In addition, the amount of human activity per day is limited and conserved [31]. Thus, for example, if a person did not concentrate on work in the morning, the person would probably work harder in the afternoon to finish a planned task within the day.

• Prolongation: The effect of an event can last for a long time [1]. Hence, for example, if a phone call makes the caller feel relieved, the caller may remain more willing to do a favor for the rest of the day.

Improving interruptibility prediction based on past behaviors is challenging. First, it is important to determine how far back we need to look. We empirically verify that looking back to the beginning of the current day is sufficient to achieve the best result. Furthermore, since a temporal window is relatively long (i.e., from several hours to a day), a novel approach to feature extraction from smartphone usage data is needed for effective prediction. We carefully derive relevant features that include the statistical measures, value distributions, and temporal shapes of the time series of smartphone sensor data. Figure 1 shows the concept of our methodology.


Figure 1. The main concept of our methodology.


In addition to the features extracted from the present time and the immediate past, those extracted from the current moment back to the beginning of the day are provided to the feature selection module. Many more features are derived from the today window than from the current point and the immediate-past window because of its longer duration. Finally, only the discriminative features resulting from the feature selection module are used for training and prediction. We note that our work is orthogonal to existing work that explores predictive data sources (e.g., [17]). Our design supports any time-series data of numeric, binary, and nominal variables. Thus, given a set of attributes, we attempt to maximize the benefits of those attributes by harnessing daily behaviors.
RESEARCH QUESTIONS

Our research questions for this project are as follows.

• RQ1: Interruptibility is affected by the immediate-past behavior as well as the current status.

• RQ2: The accuracy of interruptibility prediction improves significantly when the behavior of the current day is used.

• RQ3: Looking further back beyond the current day is not very helpful for interruptibility prediction.

• RQ4: The behavior of the target day can be replaced with that of a preceding day without a large loss of accuracy.


INTERRUPTIBILITY DATA SETS

We used two real-world smartphone usage data sets: the KAIST data set and the Device Analyzer data set [29]. The former is our proprietary data set, and the latter is a public data set. In particular, the Device Analyzer data set is known to be the largest collection of smartphone usage data, and we extracted 907 of its users for our experiments in order of the number of recorded instances. In both data sets, the hour of day and the day of week were attached to every record. Interruptibility is modeled as a binary state, as is typical in recent studies [18, 21, 22, 24]. Table 1 shows the general statistics of the two data sets. The "# Attributes" column gives the number of attributes, which are detailed in Tables 2 and 3, respectively. The last column reports the total number of interruptibility labels (interruptible or non-interruptible).


Table 1. Statistics of the two data sets.

Data Set          # Attributes   # Users   # Labels
KAIST             24             25        4,103
Device Analyzer   26             907       1,870,315

KAIST Data Set

Participants

We conducted a field study with 25 participants who installed our own data-collection application and reported their data for four weeks. The goal of this study was to obtain not only smartphone usage data but also the ground truth on the participants' interruptibility through experience sampling. Of the 25 participants, 5 were recruited from our department and 20 from an online community. For the former group, we personally asked them to join the experiment. For the latter group, we posted a recruitment advertisement on the online community and selected 20 eager applicants. We then provided all participants with detailed instructions and obtained their explicit consent before the experiment. After the experiment was complete, we paid each participant about US$100. This study was approved by the KAIST institutional review board (IRB).

Data Collection

All participants downloaded the data-collection application from Google Play and installed it on their Android smartphones. Our application supports Android 4.0 or higher. Table 2 shows the attributes that the application collected in the background. Each participant sent us his/her data at the end of each week, and we checked the data to give him/her feedback on its quality. As recommended by the IRB, we did not collect any personal information that could be used to identify the data owner. The data collection ran for four weeks, from February 2015 to March 2015.
Table 2. Attributes collected in the KAIST data set.

Attribute     Description                   Type      # Instances
cpu           CPU usage                     numeric   67,590
bat_lev       Battery level                 numeric   242,560
bat_temp      Battery temperature           numeric   242,560
cell_strn     Cellular signal strength      numeric   95,071
wifi_strn     WiFi signal strength          numeric   40,482
ill           Ambient light level           numeric   37,093
accel_x       Acceleration force (X-axis)   numeric   119,347
accel_y       Acceleration force (Y-axis)   numeric   119,347
accel_z       Acceleration force (Z-axis)   numeric   119,347
accel_tot     Acceleration force (total)    numeric   119,347
airplane      Airplane mode on/off          binary    67,590
screen        Screen on/off                 binary    67,590
headset       Headset mode on/off           binary    67,590
cell          Cellular mode on/off          binary    95,071
wifi          WiFi mode on/off              binary    40,482
charge        Charge mode on/off            binary    242,560
ringtone      Ringtone mode                 nominal   67,590
charge_stat   Charge status                 nominal   242,560
ssid          Connected WiFi SSID           nominal   40,482
app_pkg       Application package name      nominal   264,520
app_cat       Application category          nominal   264,520
location      Location name (district)      nominal   52,744
call          Phone call event              nominal   3,530
sms           Message event                 nominal   4,964

Experience Sampling

We collected the ground-truth information about the participants' state of interruptibility via experience sampling [5]. The experience sampling method (ESM) is a signal-contingent method of collecting data from participants about their current experience or situation. In our case, we collected in-situ self-reports on the subjective state of interruptibility.

Our server triggered the notification shown in Figure 2 five times a day, at random times between 9 a.m. and 10 p.m. The notification asked the participants to answer the question with "Yes" or "No". The meaning of the question was explained to all participants: "you are interruptible if you are willing to do a simple task by spending less than ten minutes right now."


Figure 2. Screen capture of the experience sampling probe ("Are you interruptible?" with YES and NO buttons).

A participant's interruptibility was recorded together with temporal information (e.g., time and date) when he/she responded to a question. If a participant did not respond within ten minutes after receiving a question, we recorded his/her status as "not interruptible" at that time.
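The labeling rule above can be sketched as follows. This is a minimal Python illustration, not the authors' code. The helper name `esm_label` is hypothetical. An answer that arrives exactly ten minutes after the probe is assumed to count as on time. A missed probe is assumed to be timestamped at the moment the probe was delivered.

```python
from datetime import datetime, timedelta

# From the text: an unanswered probe counts as "not interruptible" after ten minutes.
RESPONSE_TIMEOUT = timedelta(minutes=10)


def esm_label(prompt_time, response_time=None, answer=None):
    """Return (label_time, interruptible) for one experience-sampling probe.

    Hypothetical helper. A "Yes" received within the timeout is labeled
    interruptible and a "No" is not; a missing or late answer is labeled
    not interruptible and timestamped at probe delivery (our assumption).
    """
    answered_in_time = (
        response_time is not None
        and response_time - prompt_time <= RESPONSE_TIMEOUT
    )
    if not answered_in_time:
        return prompt_time, False
    return response_time, answer == "Yes"


probe = datetime(2015, 2, 14, 18, 30)
label = esm_label(probe, probe + timedelta(minutes=3), "Yes")
# label -> (datetime(2015, 2, 14, 18, 33), True)
```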
Device Analyzer Data Set

Description 

The Device Analyzer project is maintained by the University of Cambridge¹. Its data set contains over 100 billion records of Android smartphone usage from over 17,000 devices across the world and is known to be the largest collection of smartphone usage data. We obtained an official license from the University of Cambridge and downloaded a snapshot of the data set as of November 2015. The raw data amounted to around 7.5 terabytes. While the project collects more than 50 attributes, we selected the 26 attributes that correspond to those of the KAIST data set. Table 3 lists all the attributes used in this project.


Table 3. Attributes used in the Device Analyzer data set.

Attribute        Description                 Type      # Instances
bat_lev          Battery level               numeric   180,018,744
bat_temp         Battery temperature         numeric   180,018,744
vol_music        Media (music) volume        numeric   56,860,086
vol_alarm        Alarm sound volume          numeric   56,860,086
vol_voicecall    Voice call sound volume     numeric   56,860,086
vol_system       System sound volume         numeric   56,860,086
vol_ring         Ringtone sound volume       numeric   56,860,086
vol_noti         Notification sound volume   numeric   56,860,086
accel            Acceleration force          numeric   29,118,563
light            Ambient light level         numeric   23,184,758
sms_unread_cnt   Number of unread SMS        numeric   4,176,962
airplane         Airplane mode on/off        binary    8,079,408
screen           Screen on/off               binary    36,064,991
headset          Headset mode on/off         binary    1,526,204
wifi             WiFi mode on/off            binary    2,028,945
wifi_conn        WiFi connectivity           binary    4,483,934
mobile_conn      Mobile connectivity         binary    6,290,799
bluetooth        Bluetooth on/off            binary    229,058
charge           Charge mode on/off          binary    9,695,334
ringtone         Ringtone mode               nominal   8,368,493
charge_stat      Charge status               nominal   9,695,334
display_orient   Display orientation         nominal   9,403,893
app_pkg          Application package name    nominal   92,774,207
app_cat          Application category        nominal   92,774,207
location         Location (LAC, CID)         nominal   60,754,805
sms              Message event               nominal   1,334,568

From the 9,641 users in total, we extracted the users who had enough records to achieve reliable results. Figure 3 shows the distribution of the number of incoming calls per user, which follows a power-law distribution. We calculated the 70th percentile in terms of the total number of calls and chose the users who had more calls than this percentile. In this way, we extracted 907 heavy users from the entire set of users, as indicated by the yellow area in Figure 3. They contributed 70% of all incoming calls. In addition, their data were recorded for 274 days on average.
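The text states the selection rule in terms of a 70th percentile. One reading is consistent with the reported figures, in which 907 of 9,641 users together contributed 70% of all incoming calls. Under that reading, users are ranked by incoming calls, and the smallest top group whose calls reach 70% of the total is kept. The sketch below implements this reading. It is our interpretation rather than the authors' code, and `select_heavy_users` is a hypothetical name.

```python
def select_heavy_users(incoming_calls, share=0.70):
    """Return the top-ranked users whose incoming calls first reach `share` of all calls.

    `incoming_calls` maps a user id to that user's number of incoming calls.
    This cumulative-share rule is an interpretation of the text, not the
    authors' published procedure.
    """
    total = sum(incoming_calls.values())
    # Rank users from the heaviest to the lightest caller.
    ranked = sorted(incoming_calls.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0
    for user, calls in ranked:
        if covered >= share * total:
            break  # the selected users already account for the target share
        selected.append(user)
        covered += calls
    return selected
```

With a heavy-tailed (power-law) distribution, such a rule keeps only a small fraction of users, which matches the roughly 9% retained here.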


¹ https://deviceanalyzer.cl.cam.ac.uk/
Figure 3. Distribution of the number of incoming calls per user in the Device Analyzer data set (x-axis: number of incoming calls).


Ground Truth

Since the Device Analyzer data set does not contain experience sampling data, we treat call availability [20] as interruptibility. As an alternative to the ESM, this implicit labeling infers interruptibility from observed user actions, just as Pielot et al. [21] did using notification dismissal. Our labeling is reasonable because users in non-interruptible circumstances cannot or do not pick up incoming calls, owing to unavoidable, enforced, intentional, or negligent unavailability [23]. We excluded call-related attributes from prediction, since they directly indicate interruptibility. A user is regarded as interruptible when he/she picks up an incoming call and continues the call for at least ten seconds. In contrast, a user is regarded as not interruptible when he/she does not pick up an incoming call or ends the call within ten seconds.
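The call-based labeling rule reduces to a simple predicate. In this sketch, the `(timestamp, answered, duration_s)` record format and the function names are hypothetical, because the Device Analyzer event schema is not described here. A call lasting exactly ten seconds counts as interruptible, since the text says "at least ten seconds".

```python
MIN_CALL_SECONDS = 10  # threshold stated in the text


def call_label(answered: bool, duration_s: float) -> bool:
    """Label one incoming call: True = interruptible, False = not interruptible.

    A call counts as interruptible only if it was picked up and lasted at
    least ten seconds; missed calls and calls ended sooner are negatives.
    """
    return answered and duration_s >= MIN_CALL_SECONDS


def label_call_log(calls):
    """Map hypothetical (timestamp, answered, duration_s) records to (timestamp, label)."""
    return [(ts, call_label(answered, dur)) for ts, answered, dur in calls]
```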


Detailed Statistics

To examine the data sets at a finer granularity, we divide a day into six equal-width timeslots, as shown in Figure 4.


Figure 4. Six timeslots in a day: dawn (6 a.m. to 9 a.m.), morning (9 a.m. to 12 p.m.), lunch (12 p.m. to 3 p.m.), afternoon (3 p.m. to 6 p.m.), dinner (6 p.m. to 9 p.m.), and night (9 p.m. to 12 a.m.).
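The mapping from a time of day to a timeslot can be sketched as below. The boundaries are read off Figure 4: six 3-hour slots from 6 a.m. to midnight, which matches the 180-minute timeslots used for the DWT later. The text does not say how times between midnight and 6 a.m. are handled, so the sketch returns `None` for them. `timeslot_of` is a hypothetical name.

```python
# Timeslot boundaries read off Figure 4: six 3-hour slots from 6 a.m. to midnight.
TIMESLOTS = ("dawn", "morning", "lunch", "afternoon", "dinner", "night")
DAY_START_HOUR = 6
SLOT_HOURS = 3


def timeslot_of(hour):
    """Return the timeslot name for a time of day given in (fractional) hours.

    Times before 6 a.m. are not covered by Figure 4, so None is returned;
    how the original study treats them is an open assumption.
    """
    if not DAY_START_HOUR <= hour < 24:
        return None
    return TIMESLOTS[int((hour - DAY_START_HOUR) // SLOT_HOURS)]
```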

Figure 5 shows the number of interruptibility labels (i.e., interruptible or non-interruptible) per user in each timeslot over the experimental period. The median numbers range between 32 and 47 in Figure 5(a) and between 260 and 298 in Figure 5(b). While the KAIST data set has a sufficient number of labels, the Device Analyzer data set has a significantly larger number of labels.
Figure 5. Number of interruptibility labels in each timeslot.
Figure 6 shows the proportion of labels that are interruptible or not in each timeslot. In Figure 6(a), 41.2% of the labels indicate being interruptible, and 58.8% indicate being not interruptible. In Figure 6(b), the corresponding proportions are 55% and 45%, respectively. The proportion of interruptible labels is higher in Figure 6(b) than in Figure 6(a) because talking over the phone is easier than accepting a simple task. We note that the two label values tend to be balanced across all timeslots in both data sets.


Figure 6. Proportion of interruptibility label values (interruptible vs. not interruptible) in each timeslot, for (a) the KAIST data set and (b) the Device Analyzer data set.


METHODOLOGY 

In this section, we propose our methodology for extracting daily features and modeling interruptibility using the extracted features.

Temporal Windows

First of all, to answer RQ1-RQ4, we define three types of temporal windows (Figure 7) and consider them together with the current point. In Definition 1, we clarify the source of a feature depending on whether it is extracted from the current point or from a temporal window.

• Current point: the current moment when interruptibility needs to be predicted

• Immediate-past window: the interval from the current point back to 15 minutes before

• Today window: the interval from the current point back to the beginning of the current day

• Yesterday window (or the-day-before-yesterday window): the entire previous day (or the day before it), from its end back to its beginning

Figure 7. Temporal windows used for feature extraction.


We define the basic features as those extracted from the current point in Figure 7, and the extended features as those extracted from a temporal window in Figure 7. In particular, we call the extended features from the today window the daily features.
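The window boundaries for a given prediction time can be sketched with the standard `datetime` module. Two things here are our assumptions rather than the paper's: that a day begins at midnight, and the key names in the returned dictionary.

```python
from datetime import datetime, timedelta

IMMEDIATE_PAST = timedelta(minutes=15)  # length of the immediate-past window


def temporal_windows(now):
    """Return (start, end) bounds of each temporal window for a prediction time.

    Assumes a day begins at midnight; the dictionary keys are hypothetical names.
    """
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    one_day = timedelta(days=1)
    return {
        "imm_past": (now - IMMEDIATE_PAST, now),
        "today": (day_start, now),  # grows as the day progresses
        "yesterday": (day_start - one_day, day_start),
        "day_before_yesterday": (day_start - 2 * one_day, day_start - one_day),
    }
```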
Extended Features

To cover various data sources, we categorize attributes (variables) into three types, namely numeric, binary, and nominal attributes, as shown in Figure 8. Since each attribute type has distinct characteristics, we define the extended features separately for each type so that they best represent the attribute values of that type in a given temporal window.


Figure 8. Three types of the attributes in the interruptibility data sets: (a) numeric, (b) binary, and (c) nominal.


Overview 

Table 4 lists the extended features for each attribute type, complementing Figure 8. For numeric attributes, the mean and standard deviation are calculated to represent the central tendency and dispersion of the values in a temporal window. In addition, a discrete wavelet transform (DWT) is applied to capture the general trend (i.e., shape) in a given window, as discussed in detail later in this section. For (asymmetric) binary attributes, since a positive value is more important, we keep the total duration of "1" samples and the number of transitions from "0" to "1" in a temporal window. Since a nominal attribute is a generalization of a binary attribute, we keep the same duration and number for each possible value.
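The measures above can be sketched as follows. The sketch treats the samples as a step function in which each value holds until the next sample, or until the end of the window. That holding assumption is ours, as are the use of the population standard deviation and the function names. The nominal case reuses the binary measures on a one-hot indicator of each value, mirroring the remark that a nominal attribute generalizes a binary one.

```python
from statistics import mean, pstdev


def numeric_features(values):
    """mean and std of the numeric samples in a window (population std assumed)."""
    return {"mean": mean(values), "std": pstdev(values)}


def binary_features(times, values, window_end):
    """dur: total time spent in state 1; num: number of 0 -> 1 transitions.

    Assumes each sample holds until the next sample (or the window end) and
    that a window starting in state 1 does not count as a transition.
    """
    dur, num = 0, 0
    for k, (t, v) in enumerate(zip(times, values)):
        t_next = times[k + 1] if k + 1 < len(times) else window_end
        if v == 1:
            dur += t_next - t
            if k > 0 and values[k - 1] == 0:
                num += 1
    return {"dur": dur, "num": num}


def nominal_features(times, values, window_end, categories):
    """Apply the binary measures to a one-hot indicator of each nominal value."""
    feats = {}
    for c in categories:
        indicator = [1 if v == c else 0 for v in values]
        b = binary_features(times, indicator, window_end)
        feats[f"{c}_dur"], feats[f"{c}_num"] = b["dur"], b["num"]
    return feats
```

For example, screen samples `[0, 1, 0, 1]` at minutes `[0, 10, 20, 35]` in a 60-minute window give 35 minutes of "on" time and two 0-to-1 transitions.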


Table 4. Extended features derived from a temporal window.

Measure      Description
Numeric attributes (Figure 8(a))
mean         the mean of the samples
std          the standard deviation of the samples
dwt          the 32 DWT coefficients of the samples
Binary attributes (Figure 8(b))
dur          the sum of the duration of "1" samples
num          the total number of transitions to "1"
Nominal attributes (Figure 8(c))
vali_dur     the sum of the duration of "vali" samples
vali_num     the total number of transitions to "vali"

While the current point and the immediate-past window are treated as atomic units, the today window, the yesterday window, and the day-before-yesterday window are partitioned into six timeslots (dawn, morning, lunch, afternoon, dinner, and night) according to Figure 4 before extended features are derived. For the today window, the timeslot containing the current point is considered only up to the present time. For example, if the present time is 8 p.m., the timeslot dinner spans from 6 p.m. to 8 p.m. (not 9 p.m.). The goal of this partitioning is to shorten each interval so that it has coherent semantics, because treating an overly long interval as a whole may lose important information.

Figure 9 summarizes how an extended feature is constructed. If a temporal window is the immediate-past window, all measures in Table 4 except dwt are calculated for the whole window. Otherwise, the window is split into timeslots, and the measures are calculated for each timeslot. A feature name is formed by concatenating the names of an attribute, a temporal window, a timeslot, and a measure, e.g., cpu_today_lunch_std.
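Two steps above can be sketched together: clipping the today window at the current time, and the naming convention. The window label `ipast` and the function names are hypothetical. Only `today` and the example `cpu_today_lunch_std` come from the text.

```python
TIMESLOTS = ("dawn", "morning", "lunch", "afternoon", "dinner", "night")


def today_timeslot_spans(now_hour, day_start_hour=6, slot_hours=3):
    """Timeslots of the today window as (name, start_hour, end_hour).

    The slot containing the current time is truncated at `now_hour`, so at
    8 p.m. (20.0) the last span is ("dinner", 18, 20.0) rather than ending at 21.
    """
    spans = []
    for i, name in enumerate(TIMESLOTS):
        start = day_start_hour + slot_hours * i
        if start >= now_hour:
            break  # later timeslots have not started yet
        spans.append((name, start, min(start + slot_hours, now_hour)))
    return spans


def feature_name(attribute, window, measure, timeslot=None):
    """Join attribute, window, optional timeslot, and measure with underscores."""
    parts = [attribute, window] + ([timeslot] if timeslot else []) + [measure]
    return "_".join(parts)
```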
Figure 9. Notation and composition of extended features. A basic feature is named <Attribute>_curr. An extended feature is named <Attribute>_<Temporal Window>_<Measure> for the immediate-past window and <Attribute>_<Temporal Window>_<Timeslot>_<Measure> otherwise.


Discrete Wavelet Transform (DWT) Features

The DWT has been widely used for compression and dimensionality reduction owing to its ability to capture the major trends of the underlying data. It decomposes an input sequence into a set of wavelets and produces a set of coefficients of the same size as the sequence. We adopt the Haar wavelet [2], which is very simple yet effective. Going from the first coefficient to the last, the coefficients describe the sequence at progressively finer temporal resolution. A key advantage over other transforms (e.g., the Fourier transform) is that the DWT captures location in time as well as frequency. This is essential for our problem, since the time when an event happened should be preserved. Before going into the details, we present a motivating example for the DWT.

Motivating Example: Figure 10 shows the values of accel_x in the KAIST data set. For the two timeslots in Figures 10(a) and 10(b), the mean (denoted by the red dashed line) and the standard deviation are almost the same, although the shapes are very different from each other. User 22 barely moved on February 14 (Figure 10(a)), whereas the same user moved frequently on February 24 (Figure 10(b)). In fact, the interruptibility at dinner on February 14 was different from that at dinner on February 24. Thus, the temporal shape is related to interruptibility, and the DWT is needed to capture a temporal shape that can be characterized neither by the mean nor by the standard deviation.
Figure 10. A motivating example for using the DWT.


We derive DWT coefficients for the sequence generated from each 3-hour timeslot covered by a temporal window. We do not apply the DWT to the immediate-past window, since its duration is not long enough. First, a sequence of length 180 is constructed from the values at every minute. If a value does not exist, it is estimated by linear interpolation between the two neighboring timestamps. Then, we pad zeros on the right side of the sequence to make its length 256, because the DWT is defined for sequences whose length is a power of 2. Last, after applying the DWT to the padded sequence, only the first 32 coefficients are kept for dimensionality reduction. This approach is widely accepted when DWT coefficients are used as features [3, 30]. Figure 11 shows the sequences obtained by reconstructing the sequences in Figure 10 from their first 32 coefficients. For both sequences in Figures 11(a) and 11(b), the shape of the restored sequence (red) is very close to that of the original sequence (blue).
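The per-timeslot procedure above can be sketched with NumPy. The Haar transform is written out by hand so that the block needs nothing beyond NumPy. It is the orthonormal variant, with coefficients ordered from coarsest to finest; this normalization and ordering are our assumptions, and the function names are hypothetical. Behavior outside the observed timestamps (edge values held constant by `np.interp`) is also an assumption.

```python
import numpy as np


def haar_dwt(x):
    """Full orthonormal Haar DWT: [approximation, coarsest detail, ..., finest details]."""
    approx = np.asarray(x, dtype=float)
    n = len(approx)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of 2")
    details = []
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))  # finer levels are produced first
        approx = (even + odd) / np.sqrt(2)
    return np.concatenate([approx] + details[::-1])  # reorder coarse-to-fine


def inverse_haar_dwt(coeffs):
    """Invert haar_dwt; zeroed trailing coefficients give a coarser reconstruction."""
    coeffs = np.asarray(coeffs, dtype=float)
    approx, pos = coeffs[:1], 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / np.sqrt(2)
        out[1::2] = (approx - detail) / np.sqrt(2)
        approx = out
    return approx


def timeslot_dwt_features(times_min, values, slot_start_min, minutes=180, length=256, keep=32):
    """First `keep` Haar coefficients of one timeslot, following the steps in the text.

    `times_min` must be increasing (in minutes). np.interp holds the edge values
    constant outside the observed range, which is an assumption.
    """
    grid = slot_start_min + np.arange(minutes)  # one value per minute
    seq = np.interp(grid, times_min, values)    # linear interpolation of gaps
    padded = np.zeros(length)
    padded[:minutes] = seq                      # zero-pad on the right to 256
    return haar_dwt(padded)[:keep]
```

In this orthonormal, coarse-to-fine ordering, the first 32 coefficients determine the means of 32 consecutive 8-minute blocks. Zeroing the rest and inverting therefore yields a piecewise-constant curve that follows the overall shape of the timeslot.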
Figure 11. Reconstruction using the first 32 Haar wavelet coefficients.


Feature Configurations

Table 5 defines the seven feature configurations used in the rest of this project. CURR takes account of the current point only. IPAST extends feature extraction to the immediate past. DAY[] additionally uses the features constructed from a long duration, on top of those used by IPAST. In its index, 0 indicates the current day, -1 one day before it, and -2 two days before it, and a colon denotes an inclusive range. For example, DAY[0] takes account of the current point, the immediate-past window, and the today window.


Table 5. Feature configurations used in this project (x: used, -: not used).

Conf.        D-b-Yesterday   Yesterday   Today   Imm-Past   Current
CURR         -               -           -       -          x
IPAST        -               -           -       x          x
DAY[0]       -               -           x       x          x
DAY[-1:0]    -               x           x       x          x
DAY[-2:0]    x               x           x       x          x
DAY[-1]      -               x           -       x          x
DAY[-2]      x               -           -       x          x

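For example, consider the numeric attribute cpu with the current point in the timeslot night. The configuration DAY[0] then produces 21 features in total, as shown in Figure 12. These are one current-point feature, two immediate-past features (mean and std), and three features (mean, std, and dwt) for each of the six timeslots of the today window, with the 32 DWT coefficients counted as one feature. Concatenating the nodes along the arrows from the root to a leaf yields a feature.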


Figure 12. List of all features for the attribute cpu in DAY[0].
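The composition in Figure 12 can be reproduced by enumerating windows, timeslots, and measures. As in Table 4, the 32 DWT coefficients are counted as a single `dwt` entry. The window lists for DAY[-1] and DAY[-2] follow the statement that DAY[] builds on the features used by IPAST. The labels `ipast` and `dbyesterday` are hypothetical.

```python
TIMESLOTS = ("dawn", "morning", "lunch", "afternoon", "dinner", "night")

# Windows used by each configuration in Table 5 ("curr" is the current point).
CONFIGS = {
    "CURR": ("curr",),
    "IPAST": ("curr", "ipast"),
    "DAY[0]": ("curr", "ipast", "today"),
    "DAY[-1:0]": ("curr", "ipast", "today", "yesterday"),
    "DAY[-2:0]": ("curr", "ipast", "today", "yesterday", "dbyesterday"),
    "DAY[-1]": ("curr", "ipast", "yesterday"),
    "DAY[-2]": ("curr", "ipast", "dbyesterday"),
}


def numeric_feature_names(attr, config, current_slot):
    """List the feature names for one numeric attribute (dwt counted as one entry)."""
    names = []
    for window in CONFIGS[config]:
        if window == "curr":
            names.append(f"{attr}_curr")
        elif window == "ipast":
            # No DWT for the immediate-past window: its duration is too short.
            names += [f"{attr}_ipast_{m}" for m in ("mean", "std")]
        else:
            # The today window stops at the timeslot containing the current point.
            last = TIMESLOTS.index(current_slot) + 1 if window == "today" else len(TIMESLOTS)
            names += [f"{attr}_{window}_{s}_{m}"
                      for s in TIMESLOTS[:last] for m in ("mean", "std", "dwt")]
    return names
```

For `numeric_feature_names("cpu", "DAY[0]", "night")`, this yields 1 + 2 + 6 × 3 = 21 names, matching the count stated for Figure 12.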
EVALUATION

In this section, we report the results of a series of experiments designed to answer each research question.

Experimental Setting

Data Preprocessing

We improved the quality of the raw data through preprocessing. First, numeric attribute values were normalized to the range between 0 and 1 by min-max normalization. They were then discretized by the MDLPC method [7], which determines the optimal cut points in a supervised manner. Second, the instance timestamps were aligned across all attributes of a user. A missing value at a certain timestamp was estimated by linear interpolation for numeric attributes. For binary and nominal attributes, it was estimated by forward filling, which uses the immediately previous value. Third, if a nominal attribute had too many possible values (e.g., location identifiers and application names), we kept the 10 most frequent values and grouped all other, infrequent values into a single value.
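The normalization, alignment, filling, and grouping steps can be sketched in Python as below. MDLPC discretization is omitted because it requires the supervised entropy-based cut-point search described in [7]. The placeholder value `OTHER`, the handling of constant attributes, and the function names are illustrative assumptions.

```python
from collections import Counter

import numpy as np


def min_max(values):
    """Scale numeric values to [0, 1]; a constant attribute maps to all zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def align_numeric(times, values, common_times):
    """Linearly interpolate a numeric attribute onto the user's shared timestamps."""
    return np.interp(common_times, times, values)


def forward_fill(values):
    """Replace None with the immediately previous value (binary/nominal attributes)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def group_infrequent(values, k=10, other="OTHER"):
    """Keep the k most frequent nominal values and merge the rest into one value."""
    top = {v for v, _ in Counter(values).most_common(k)}
    return [v if v in top else other for v in values]
```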

Feature Selection and Prediction

For feature selection, we used the correlation-based feature selection (CFS) method [10] implemented in Weka. The CFS method selects a subset of features that are highly correlated with the class while having low intercorrelation. For prediction, we used four classification methods: naive Bayes (NB), support vector machine (SVM), random forest (RF), and C4.5 decision tree (C4.5). We present only the results of the naive Bayes classifier because it achieved the highest accuracy.
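CFS scores a feature subset S of k features with Hall's merit heuristic, Merit_S = k·r_cf / sqrt(k + k(k-1)·r_ff). Here r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. The sketch below uses absolute Pearson correlation and a greedy forward search purely for illustration. Weka's CfsSubsetEval may compute correlations differently and uses a configurable search method, so this does not reproduce the Weka run. The function names are hypothetical.

```python
import numpy as np


def _abs_corr(a, b):
    """Absolute Pearson correlation; 0 for a constant column (assumption)."""
    if np.std(a) == 0 or np.std(b) == 0:
        return 0.0
    return abs(np.corrcoef(a, b)[0, 1])


def cfs_merit(X, y, subset):
    """Hall's CFS merit of the feature columns `subset` of X for class labels y."""
    k = len(subset)
    r_cf = np.mean([_abs_corr(X[:, i], y) for i in subset])
    if k == 1:
        return r_cf
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = np.mean([_abs_corr(X[:, i], X[:, j]) for i, j in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(X, y):
    """Greedy forward search: repeatedly add the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [cfs_merit(X, y, selected + [f]) for f in remaining]
        idx = int(np.argmax(scores))
        if scores[idx] <= best:
            break  # no candidate improves the merit of the current subset
        best = scores[idx]
        selected.append(remaining.pop(idx))
    return selected
```

Because r_ff appears in the denominator, a feature that merely duplicates an already selected one adds essentially no merit. This is how CFS favors features with low intercorrelation.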

Compared  Methods 

We  compared  the  seven  feature  configurations  in  Table  5. 

•  Baseline  (CURR):  corresponding  to  earlier  work  (e.g.,  [15])  that  uses  only  the  present¬ 
time  features 

•  State-of-the-art  (IPAST):  corresponding  to  recent  work  (e.g.,  [8,  14,  24,  25])  that  uses 
the  immediate-past  features  as  well 

•  Proposed  methodology  (DAY[0]):  using  the  daily  features  as  well 

•  Variations (DAY[-1:0], DAY[-2:0], DAY[-1], DAY[-2]): using the data from one or two days ago

Data  Sets 

For RQ1 and RQ2, we used the data from all days, but for RQ3 and RQ4 we used only the data
from Wednesdays, Thursdays, and Fridays. Because RQ3 and RQ4 additionally consider the yesterday
and the-day-before-yesterday windows, this restriction ensures that all temporal windows fall
on weekdays, which avoids a possible bias between weekdays and weekends. In addition, the
night timeslot was not used for prediction in the KAIST data set owing to a lack of ground
truth, whereas it was used in the Device Analyzer data set.
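The weekday restriction for RQ3 and RQ4 can be expressed as a simple date filter: a day qualifies only if it and the two preceding days are all weekdays, which leaves exactly Wednesday, Thursday, and Friday. The function names below are hypothetical and the sketch is illustrative only.

```python
from datetime import date, timedelta


def windows_on_weekdays(day, days_back=2):
    """True if `day` and each of the `days_back` preceding days is Mon-Fri."""
    # date.weekday(): Monday == 0 ... Sunday == 6, so < 5 means a weekday.
    return all((day - timedelta(days=d)).weekday() < 5 for d in range(days_back + 1))


def select_days_for_rq3_rq4(days):
    """Keep only the days whose today, yesterday, and day-before-yesterday are weekdays."""
    return [d for d in days if windows_on_weekdays(d)]


week = [date(2024, 1, 1) + timedelta(days=i) for i in range(7)]  # Mon 1 Jan 2024 to Sun 7 Jan
print([d.strftime("%a") for d in select_days_for_rq3_rq4(week)])  # ['Wed', 'Thu', 'Fri']
```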

Table 6 shows the total number of features extracted by each configuration when the current
point falls in the dinner timeslot. Only the feature count of DAY[0] depends on the current point, since DAY[0]
includes the timeslots up to that point; the counts of the other configurations do not.
The number of features increases as the duration used for feature extraction grows.
Table 6. Number of features used for prediction in the dinner timeslot.

Configuration        KAIST (dinner)    Device Analyzer (dinner)
CURR                 70                71
IPAST                195               199
DAY[0]               2,420             2,599
DAY[-1], DAY[-2]     2,865             3,079
DAY[-1:0]            5,090             5,479
DAY[-2:0]            7,760             8,359
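The counts in Table 6 obey a simple arithmetic relation in both data sets: each additional past day adds the DAY[-1] count minus the IPAST count. One reading consistent with this (our inference, not something the text states) is that the multi-day configurations share a single copy of the IPAST features. The short check below, with hypothetical names, verifies the relation on the transcribed numbers.

```python
# Dinner feature counts transcribed from Table 6. Table 6 reports a single
# count for DAY[-1] and DAY[-2], so that count is reused for both.
TABLE6 = {
    "KAIST": {"IPAST": 195, "DAY[0]": 2420, "DAY[-1]": 2865,
              "DAY[-1:0]": 5090, "DAY[-2:0]": 7760},
    "Device Analyzer": {"IPAST": 199, "DAY[0]": 2599, "DAY[-1]": 3079,
                        "DAY[-1:0]": 5479, "DAY[-2:0]": 8359},
}


def check_counts(c):
    """Hypothesis: each extra day adds the DAY[-1] count minus one shared IPAST copy."""
    one_day = c["DAY[0]"] + c["DAY[-1]"] - c["IPAST"]
    two_days = one_day + c["DAY[-1]"] - c["IPAST"]
    return one_day == c["DAY[-1:0]"] and two_days == c["DAY[-2:0]"]


for name, counts in TABLE6.items():
    print(name, check_counts(counts))  # True for both data sets
```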

Measurement 

We built a personalized classification (prediction) model for each person using only his/her own
data. We then measured the accuracy defined in Eq. (1) by 5-fold cross-validation for the
relatively small KAIST data set and by 10-fold cross-validation for the Device Analyzer data set.
In Eq. (1), TP, FP, FN, and TN denote the numbers of true positives, false positives, false
negatives, and true negatives, respectively. Finally, we report the average of the accuracy
values over all users.


$$\text{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (1)$$
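Eq. (1) and the per-user averaging can be sketched as below. How the confusion counts are combined across cross-validation folds is not specified; this sketch assumes they are pooled over all folds of a user before applying Eq. (1) (averaging the per-fold accuracies would be an alternative). The fold assignment itself is not shown, and all names and counts are hypothetical.

```python
from statistics import mean


def accuracy(tp, fp, fn, tn):
    """Eq. (1): fraction of correctly classified instances."""
    return (tp + tn) / (tp + fp + fn + tn)


def user_accuracy(fold_counts):
    """Accuracy of one personalized model, pooling (TP, FP, FN, TN) over CV folds (assumption)."""
    tp, fp, fn, tn = (sum(fold[i] for fold in fold_counts) for i in range(4))
    return accuracy(tp, fp, fn, tn)


def average_over_users(per_user_folds):
    """Mean of the per-user accuracies."""
    return mean(user_accuracy(folds) for folds in per_user_folds.values())


# Hypothetical confusion counts (TP, FP, FN, TN) for two users, one tuple per fold.
counts = {
    "user_a": [(4, 1, 0, 5), (3, 0, 1, 6)],  # pooled: 18 correct of 20 -> 0.9
    "user_b": [(5, 0, 0, 5)],                # pooled: 10 correct of 10 -> 1.0
}
print(round(average_over_users(counts), 4))  # 0.95
```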

The significance of the difference between accuracy values was tested for all pairs of
configurations using the two-tailed t-test. The t-test was performed on overall accuracy, i.e.,
the average of the accuracy values over all timeslots. The results are summarized in Tables 7
and 8, which present the p-value and the number (N) of samples used for each test.
The number of asterisks denotes the level of statistical significance. The t-test results that correspond
to RQ1, RQ2, RQ3, and RQ4 are highlighted in orange, yellow, green, and blue,
respectively.
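The testing procedure can be sketched as follows: compute each user's overall accuracy as the mean over timeslots, compute a t statistic on the per-user values of two configurations, and map the resulting p-value to the legend used in Tables 7 and 8. The text does not say whether the test is paired; since the same users are evaluated under every configuration, this sketch assumes a paired (dependent-samples) test. It stops at the t statistic and its degrees of freedom; the two-sided p-value would come from the t distribution, for example via `scipy.stats.ttest_rel`. Function names are hypothetical.

```python
import math
from statistics import mean, stdev


def overall_accuracy(per_timeslot):
    """Overall accuracy of one user: the mean of the per-timeslot accuracies."""
    return mean(per_timeslot.values())


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom of a paired t-test on per-user values (assumption)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def significance_stars(p):
    """Map a p-value to the NS/asterisk legend of Tables 7 and 8."""
    for threshold, stars in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < threshold:
            return stars
    return "NS"


# Usage with hypothetical per-user overall accuracies for two configurations.
t, df = paired_t_statistic([0.92, 0.88, 0.90], [0.85, 0.80, 0.86])
print(df, significance_stars(0.0109))  # 2 *
```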


Table 7. T-test results for the KAIST data set.

NS: p > 0.05, *: p < 0.05, **: p < 0.01, ***: p < 0.001, ****: p < 0.0001
Each cell gives the p-value followed by the number of samples N in parentheses.

             CURR             IPAST            DAY[0]           DAY[-1:0]        DAY[-2:0]        DAY[-1]
IPAST        0.0002*** (25)
DAY[0]       0**** (25)       0**** (25)
DAY[-1:0]    0**** (24)       0.0109* (24)     NS (24)
DAY[-2:0]    0**** (24)       0.0109* (24)     NS (24)          NS (24)
DAY[-1]      0**** (24)       NS (24)          0.018* (24)      0.018* (24)      0.018* (24)
DAY[-2]      0**** (24)       NS (24)          0.018* (24)      0.018* (24)      0.018* (24)      NS (24)
Table 8. T-test results for the Device Analyzer data set.

NS: p > 0.05, *: p < 0.05, **: p < 0.01, ***: p < 0.001, ****: p < 0.0001
Each cell gives the p-value followed by the number of samples N in parentheses.

             CURR             IPAST            DAY[0]           DAY[-1:0]        DAY[-2:0]        DAY[-1]
IPAST        0**** (907)
DAY[0]       0**** (907)      0**** (907)
DAY[-1:0]    0**** (883)      0**** (883)      0.0487* (883)
DAY[-2:0]    0**** (883)      0.008** (883)    NS (883)
DAY[-1]      0**** (883)      0**** (883)      0.0128* (883)    0**** (883)      0**** (883)
DAY[-2]      0**** (883)      0**** (883)      0.0212* (883)    0**** (883)      0*** (883)       NS (883)

RQ1  and  RQ2:  Daily  Features 

Figure  13  shows  the  accuracy  calculated  based  on  different  features  to  address  RQ1  and  RQ2 
for  both  data  sets.  Error  bars  indicate  the  standard  error. 
Figure 13. Accuracy based on different features to address RQ1 and RQ2.

In both data sets, DAY[0] achieved the highest accuracy, followed by IPAST and CURR, for all
timeslots. In addition, as the orange and yellow cells in Tables 7 and 8 (corresponding to
RQ1 and RQ2, respectively) show, the differences between CURR and IPAST and between
IPAST and DAY[0] are statistically significant. Table 9 shows the results of all four classifiers for
the experiment in Figure 13, where the colored cells correspond to the values in the plot.
There was no large difference among the classifiers.
Table 9. Accuracy results for RQ1 and RQ2 with all four classifiers.

                        Accuracy (KAIST)             Accuracy (Device Analyzer)
Timeslot    Conf.       NB    SVM   RF    C4.5       NB    SVM   RF    C4.5
Morning     CURR        0.82  0.75  0.76  0.77       0.78  0.77  0.77  0.77
            IPAST       0.91  0.86  0.87  0.84       0.82  0.80  0.81  0.80
            DAY[0]      0.95  0.91  0.91  0.84       0.83  0.82  0.82  0.82
Lunch       CURR        0.77  0.71  0.70  0.74       0.78  0.76  0.77  0.77
            IPAST       0.83  0.79  0.79  0.79       0.81  0.79  0.80  0.80
            DAY[0]      0.90  0.87  0.86  0.82       0.83  0.81  0.82  0.81
Afternoon   CURR        0.78  0.73  0.72  0.74       0.78  0.76  0.77  0.77
            IPAST       0.83  0.81  0.7   0.78       0.81  0.79  0.80  0.80
            DAY[0]      0.91  0.87  0.88  0.83       0.83  0.81  0.82  0.81
Dinner      CURR        0.78  0.73  0.71  0.75       0.77  0.76  0.76  0.76
            IPAST       0.84  0.79  0.79  0.79       0.80  0.79  0.80  0.79
            DAY[0]      0.91  0.89  0.89  0.84       0.83  0.81  0.82  0.81
Night       CURR        -     -     -     -          0.78  0.76  0.76  0.76
            IPAST       -     -     -     -          0.81  0.79  0.80  0.79
            DAY[0]      -     -     -     -          0.85  0.84  0.84  0.82
Overall     CURR        0.79  0.73  0.73  0.75       0.78  0.76  0.77  0.77
            IPAST       0.85  0.81  0.81  0.80       0.81  0.79  0.80  0.80
            DAY[0]      0.92  0.89  0.88  0.83       0.84  0.82  0.82  0.81

Figure 14 shows the top-15 discriminative features for DAY[0]. Here, the importance of a
feature is determined by the number of users whose model still contains it after feature
selection. To avoid redundancy, we show the results only for the last timeslot (dinner for the KAIST
data set and night for the Device Analyzer data set), because the results were consistent across all timeslots.
As shown in Figure 14, the features from the current point and the immediate-past window were
ranked first and second, respectively, while most of the remaining features were extracted from the today window.
Interestingly, even though we predicted interruptibility for the last timeslot, many of these today-window features
came from earlier timeslots (even including dawn): 6 out of 9 in the KAIST data set and 10
out of 12 in the Device Analyzer data set. This supports our claim that the behaviors
in the preceding several hours affect current interruptibility.
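The importance measure used for Figure 14 (the number of users whose selected feature set still contains a feature) amounts to a frequency count over the per-user selection results. A minimal sketch, with hypothetical user IDs and feature names:

```python
from collections import Counter


def top_features(selected_per_user, k=15):
    """Rank features by the number of users whose selected subset contains them."""
    counts = Counter()
    for features in selected_per_user.values():
        counts.update(set(features))  # count each feature at most once per user
    return counts.most_common(k)


# Hypothetical per-user feature subsets after feature selection.
selected = {
    "u1": ["screen_on_now", "accel_mean_today_morning"],
    "u2": ["screen_on_now"],
    "u3": ["screen_on_now", "battery_level_imm_past", "accel_mean_today_morning"],
}
print(top_features(selected, k=2))
# [('screen_on_now', 3), ('accel_mean_today_morning', 2)]
```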
Figure 14. Top-15 discriminative features in DAY[0]: (a) KAIST data set (for dinner); (b) Device Analyzer data set (for night). Legend: Current Point, Imm-Past Window, Today Window, Temporal.

In Figure 14, we observe that many of the discriminative features for the KAIST data set are
accelerometer-related, whereas those for the Device Analyzer data set are screen- or battery-related;
all of these are closely related to the movement or usage of smartphones. This
finding on important sensor categories is consistent with the work of Dey et al. [6].

In conclusion, although the overall accuracy achieved with only the basic features (79% in the
KAIST data set and 77% in the Device Analyzer data set) is acceptable, leveraging the daily
features increases the accuracy further.
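The size of these gains can be read off Table 9. The sketch below transcribes only the naive Bayes columns (as integer percentages, to avoid floating-point noise) and computes, for each data set, the largest per-timeslot gain of DAY[0] over CURR and over IPAST in percentage points. Reading the "up to 13% and 8%" improvements stated elsewhere in the paper as these percentage-point differences in the KAIST data set is our interpretation.

```python
# Naive Bayes accuracy in percent, transcribed from Table 9.
NB_KAIST = {
    "Morning":   {"CURR": 82, "IPAST": 91, "DAY[0]": 95},
    "Lunch":     {"CURR": 77, "IPAST": 83, "DAY[0]": 90},
    "Afternoon": {"CURR": 78, "IPAST": 83, "DAY[0]": 91},
    "Dinner":    {"CURR": 78, "IPAST": 84, "DAY[0]": 91},
}
NB_DEVICE_ANALYZER = {
    "Morning":   {"CURR": 78, "IPAST": 82, "DAY[0]": 83},
    "Lunch":     {"CURR": 78, "IPAST": 81, "DAY[0]": 83},
    "Afternoon": {"CURR": 78, "IPAST": 81, "DAY[0]": 83},
    "Dinner":    {"CURR": 77, "IPAST": 80, "DAY[0]": 83},
    "Night":     {"CURR": 78, "IPAST": 81, "DAY[0]": 85},
}


def max_gain(table, baseline):
    """Largest per-timeslot gain of DAY[0] over `baseline`, in percentage points."""
    return max(row["DAY[0]"] - row[baseline] for row in table.values())


print(max_gain(NB_KAIST, "CURR"), max_gain(NB_KAIST, "IPAST"))  # 13 8
print(max_gain(NB_DEVICE_ANALYZER, "CURR"), max_gain(NB_DEVICE_ANALYZER, "IPAST"))  # 7 4
```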


RQ3:  Temporal  Window  Length 

Figure  15  shows  the  accuracy  calculated  based  on  different  features  to  address  RQ3  for  both 
data  sets. 
Figure 15. Accuracy based on different features (DAY[0], DAY[-1:0], and DAY[-2:0]) to address RQ3: (a) KAIST data set; (b) Device Analyzer data set.


Prior to feature selection, DAY[-1:0] and DAY[-2:0] produce about two and three times as many
features as DAY[0], respectively, because additional timeslots are considered in the yesterday and the
day-before-yesterday windows, as shown in Table 6. Despite the larger number of features,
however, we did not observe significant increases in accuracy for DAY[-1:0] and DAY[-2:0]
compared with DAY[0]. In particular, there was almost no increase in accuracy in any timeslot of the KAIST
data set. In the Device Analyzer data set, the accuracy for
DAY[-1:0] or DAY[-2:0] was slightly higher than that for DAY[0]; however, this increase is not
statistically significant at the significance level of 0.01, as shown in Table 8. In conclusion,
looking back beyond the current day does little to increase the accuracy of
interruptibility prediction.


RQ4:  Daily  Routineness 

Figure  16  shows  the  accuracy  calculated  based  on  different  features  to  address  RQ4  for  both 
data  sets. 


Figure 16. Accuracy based on different features (DAY[0], DAY[-1], and DAY[-2]) to address RQ4: (a) KAIST data set; (b) Device Analyzer data set.

The accuracy values for both DAY[-1] and DAY[-2] were slightly lower
than that for DAY[0]. This implies that the data from the current day are more helpful for predicting
interruptibility than the data from the latest (or second-latest) previous day, despite
repetitive daily patterns. However, the decrease in accuracy from DAY[0] to DAY[-1] or
DAY[-2] is not statistically significant at the significance level of 0.01, as shown in Tables 7 and 8. In
conclusion, the data from the latest (or second-latest) previous day can be a good substitute
when data from the current day are insufficient for the model.


CONCLUSION 

We proposed a feature extraction methodology for interruptibility prediction using smartphone
usage data. We conducted a field study and performed extensive experiments on two real-world
data sets. Our methodology of looking back on the current day achieved an accuracy
of over 90% in the KAIST data set, up to 13 and 8 percentage points higher than the baseline and
state-of-the-art methods, respectively. The improvement is attributed to the fact that daily behavioral features
were among the predictive features selected for many users. We also found that looking further
back beyond the current day did not significantly improve accuracy, owing to the daily routineness of human
behaviors. For the same reason, one day's behavior can largely substitute for another day's
behavior. We believe that smartphone applications built on our methodology can considerably
improve communication efficiency through a better understanding of when and how to engage with users.
One potential application that we envision is a real-time mobile Q&A service. When a user asks a question in such a
smartphone application, the question is delivered to a set of expert users; when some of them
answer the question, the answers are immediately delivered to the questioner. Thus, the
success of this service depends on selecting expert users who are interruptible at that
moment. While this work demonstrated the feasibility of exploiting daily features, in future
work we plan to further improve prediction accuracy by devising new types of extended
features.